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Abstract 

Background: Data normalization is a key step in gene expression analysis by qPCR. Endogenous control genes are 
used to estimate variations and experimental errors occurring during sample preparation and expression 
measurements. However, the transcription level of the most commonly used reference genes can vary considerably 
in samples obtained from different individuals, tissues, developmental stages and under variable physiological 
conditions, resulting in a misinterpretation of the performance of the target gene(s). This issue has been scarcely 
approached in woody species such as grapevine. 

Results: A statistical criterion was applied to select a sub-set of 19 candidate reference genes from a total of 242 
non-differentially expressed (NDE) genes derived from a RNA-Seq experiment comprising ca. 500 million reads 
obtained from 14 table-grape genotypes sampled at four phenological stages. From the 19 candidate reference 
genes, VvAIGl (AvrRpt2-induced gene) and VvTCPB (T-complex 1 beta-like protein) were found to be the most stable 
ones after comparing the complete set of genotypes and phenological stages studied. This result was further 
validated by qPCR and geNorm analyses. 

Conclusions: Based on the evidence presented in this work, we propose to use the grapevine genes VvAIGl or 
VvTCPB or both as a reference tool to normalize RNA expression in qPCR assays or other quantitative method 
intended to measure gene expression in berries and other tissues of this fruit crop, sampled at different 
developmental stages and physiological conditions. 



Background 

Quantitative real-time PCR (qPCR) is generally used for 
measuring transcripts abundance due to its high sensi- 
tivity, specificity and broad quantification range for high 
throughput and accurate expression profiling of selected 
genes [1]. Also, qPCR analysis has become the most com- 
mon method for verification of microarrays and RNA-Seq 
results [2-4]. Besides being a powerful technique, qPCR has 
certain disadvantages such as the difficulties associated 
to the inappropriate data normalization, one of the most 
important aspects to solve [5] in order to fit this technique 
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for the study of a new organism, organ or tissue. The data 
normalization is a key stage to control the artifacts and ex- 
perimental error occurring during sample preparation and 
the following experimental steps, ending with the data 
analysis. It has been shown that qPCR results are highly 
dependent on the reference genes chosen [6], which explain 
the considerable effort applied into the validation of the 
gene(s) selected for the normalization stage, prior to exten- 
sive experimentation [7]. These housekeeping genes should 
not vary in their expression level considering the different 
tissues or cells under investigation, nor in response to any 
experimental treatment [8]. 

Regardless of the experimental technique employed, ap- 
propriate normalization is essential for obtaining accurate 
and reliable quantifications of gene expression levels, 
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especially when measuring small expression differences or 
when working with tissues of different histological origin 
[9]. The purpose of normalization is to correct variability 
associated with the various steps of the experimental 
procedure, such as differences in initial sample amount, 
RNA extraction recovery and integrity, efficiency on cDNA 
synthesis and differences in the overall transcriptional activ- 
ity of the tissues or cells analyzed [10]. Among the numer- 
ous normalization approaches that have been proposed 
[11,12] the use of internal controls or reference genes has 
become the method of preference [13,14], because they 
potentially account for all of the sources of variability 
mentioned above. However, numerous studies have reported 
that the transcript quantity of the most commonly used 
reference genes can vary considerably under different de- 
velopmental, physiological and experimental conditions 
[11,15-23]. Several reference genes are commonly used, 
such as elongation factor [24,25], actin [26,27], ubiquitin 
[28,29], and ribosomal units (18S or 28S rRNA) [30-32]. 
However, several reports have demonstrated that transcript 
levels of these genes also vary considerably under dif- 
ferent experimental conditions and consequently their 
suitability for gene expression studies must be evaluated 
case by case [22,33,34]. This implies that a reference 
gene with stable expression in one organism may not be 
suitable for normalization of gene expression in another 
organism [35,36], or even in different experiments for 
the same species. 

Many works have been carried out on animal models 
and in relation to human health [37,38], fields in which 
multiple reference genes for normalization of qPCR data 
have been described. However, similar reports are less 
abundant in plants [10,35,39]. Czechowski et al. [22] 
employed a new strategy for the identification of reference 
genes in Arabidopsis thaliana, based on the microarray 
data of Affymetrix (ATH1), and several new reference genes 
were revealed [40]. This list of Arabidopsis reference genes 
was successfully employed to search for reference genes 
by sequence homology in unrelated species such as Vitis 
vinifera [7]. This approach resulted in a strategy that is 
based on the parallel use of a series of control genes and 
calculation of normalization factors using statistical algo- 
rithms [8,11,41]. It is necessary to validate the expression 
stability of a candidate control gene in each experimental 
system prior to its use for normalization. In this regard, 
several free software applications such as geNorm [8], 
NormFinder [42] or qBase [43] are used in order to iden- 
tify the best internal controls from a group of candidate 
normalization genes in a given set of biological samples. 

To our knowledge, no investigations have been yet car- 
ried out for the identification of reference genes in table 
grape, one of the most important template fruit crops. In 
this work we used a data set obtained from a large RNA- 
Seq experiment of table grape segregants phenotypically 



and genetically diverse, belonging to a 'Ruby Seedless' x 
'Sultanina crossing, sampled at three phenotypic stages, 
anthesis, fruit-setting and berries of 6-8 mm diameter 
(the last one from plants treated or not with gibberellic acid). 
We focused the search of control genes evaluating the 
variability (or stability) in the expression of 19 genes 
selected from an initial set of 242 genes that showed a 
threshold stability level, comparing the four different 
developmental and physiological conditions. Two new 
reference genes, VvAIGl (AvrRpt2-induced gene) and 
VvTCPB (T-complex 1 beta-like protein) were validated 
by qPCR and geNorm techniques and are presented as 
new housekeeping genes for table grape. 

Results and discussion 

Identification of putative reference genes 

Usually the search for reference genes in any plant species 
is based on the identification of orthologs of genes sta- 
bly expressed in model plants, mainly from Arabidopsis 
thaliana [44,45]. In this case, we used our own information 
obtained from a massive sequencing assay done with 47 
samples of the same species of interest, i.e., table grape 
(Vitis vinifera L.). This set of samples corresponds to 14 
genotypes from which RNA was collected, combining 
different flower and berry developmental stages and 
treatments (see Methods section). Even when the main 
outcome of an RNA-Seq exercise is the identification of 
differentially expressed genes, in this case the same data 
set was used to search for putative reference genes, con- 
sidered as that any gene that has a minimal expression 
level variation in every sample analyzed. Based on these 
criteria, a total of 242 candidate housekeeping genes were 
identified, using the bioinformatics workflow presented in 
Figure 1. These genes are involved in different biological 
processes (data not shown), such as synthesis, degrada- 
tion, folding, defense, stress and catabolism of proteins 
and metabolites. As this number of genes is too large to 
evaluate each one respect of their transcriptional stability, 
we selected a subset according to statistical criteria de- 
scribed in the next section. With this purpose in mind, we 
ranked the list of 242 genes according to their coefficient 
of variation (CV), even when it was not observed a direct 
relation between CVand total reads (Additional file 1: 
Table SI). 

Selection of a sub-set of 19 candidate reference genes 

Different approaches, such as Poisson distribution, quasi- 
Poisson distribution and negative binomial distribution, 
have been used to represent the statistical distribution of 
sequence data [46-48]. Under these three kinds of distribu- 
tion, the mean of count reads is highly related to variance 
[47,48]. A summary of the three statistical parameters used 
in this work to characterize the NDE genes is shown 
in Table 1. The mean and the variance were high and 
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Figure 1 Bioinformatics pipeline used for the identification of the putative reference genes obtained through a high-throughput 
sequencing of cDNA (RNA-Seq). 



positively related, while the average was not related to the 
CV (Table 2). Therefore, this last parameter, together with 
the mean, was used to select those NDE genes that behaved 
as housekeeping and had both a low variation coefficient 
and a high abundance along the 47 samples analyzed. The 
data (Table 1) showed a high estimated coefficient of 
variation (CV> 40%, with a range of -25%). This variation 
probably could be given by an intrinsic variation within 
the biological sample (phenotype, phenological stages and 
gibberellic acid treatment) or by sampling error, because 
the sources of variation are considered during the selection 
of genes as differentially expressed by edgeR package [46]. 
Only a few genes (-8% of total NDE genes) had mean 
and CV values large enough to rule out sampling errors. 
This among-samples variation could be explained because 
the genotypic effect was not taken into account for the 
selection of NDE genes. In this study, each one of the 
14 genotypes used could be differentially interacting with 
the other factors or conditions (phenotype, phenological 
stages and gibberellic acid treatment). 

Because of the difficulties to find genes that possess sim- 
ultaneously both a high expression level (number of reads) 
and a low CV, we used the coefficient of variation, which is 
not related to the mean and it is easier to interpret (Table 2). 



Table 1 Descriptive statistics of mean (u), variance 
(a 2 ) and coefficient of variation (CV) of the 242 
non-differentially expressed genes 





Min. 


<7i 


Median 


<73 


Max. 




331.7 


680.2 


820.7 


1129 


3682 


o 2 


41300 


120000 


189400 


326900 


3636000 


CV 


0.444 


0.492 


0.515 


0.544 


0.648 



Min.: minimum value; ql: first quartile; q3: third quartile; Max.: maximum value. 



This parameter has been previously used in other experi- 
ments [49,50]. The threshold estimated by the simulation 
for CVand p are listed in Table 3. According to this, as the 
threshold became more stringent, fewer genes were found 
that satisfied both criteria of selection: CV < percentile 
threshold and [i > percentile threshold (Table 3). Only 19 of 
the 242 NDE genes satisfied both criteria at 97.5% and 2.5% 
for [i and CV, respectively (Table 4). 

Primers design and analysis of the variability from 
threshold cycles value 

Primer pairs for qPCR were designed and subsequently 
evaluated on table grape cDNA. For 17 out of the 19 
primer pairs designed, a single amplicon was observed 
by electrophoretic separation; each amplicon was sequenced 
to confirm the primer specificity. The primers for VvADH7 
and VvSLP had to be excluded from the study as they pro- 
duced two amplicons under the tested PCR conditions. All 
primers were designed with the following criteria: 20-24 bp 
length, GC content between 50% and 65%, product size in 
the range of 91-268 bp and melting temperature between 
60-64°C (Table 5). Melting curve analyses of the 17 
genes showed a single peak in each case, confirming that 
the primers amplified a single product (data not shown). 
Except for VvUNP3 (129%) and VvADF2 (114%), all PCRs 



Table 2 Relationship among statistical parameters of 
read counts of non-differentially expressed genes 
(Pearson's correlation coefficient) 







a 2 


o 2 


0.951*** 




CV 


0.029 n.s. 


0.126*** 



Significance codes: p value <0.001 '***'0.00 1-0.01 '**'0.01-0.05 '*'> 0.05 n.s. 
(non-significant); jU mean; o2 variance; CV coefficient of variation. 
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Table 3 Threshold used as criteria of selection based on 
the distribution of coefficient of variation {CV) and of 
mean (u) of the 10,000 simulated genes 



Percentile 


Threshold 


n 


CV 5% 


0.536 


34 


fi 95% 


1161.3 




CV 2.5% 


0.513 


19 


{1 97.5% 


1208.2 




CV1% 


0.487 


4 


jL/ 99% 


1260.5 




CV 0.1% 


0.43 


0 


li 99.9% 


1369.3 





n: Number of NDE genes that satisfied both criteria of selection: CV < 
percentile threshold and u > percentile threshold. 



displayed amplification efficiencies between 83% and 110% 
(Additional file 2: Table S2). 

As a first approach we compared the different expression 
levels of the reference genes over all the 47 samples using 
the absolute Ct value. Analysis of the raw expression levels 
across all samples detected some variation among reference 
genes. The results (Additional file 2: Table S2 and Figure 2) 
revealed that all genes presented median ct values be- 
tween 18.5 and 24.8 and the CV was < 7% for all the 
reference genes (Additional file 2: Table S2), among which 



VvAIGl and VvTCPB presented the lowest CVs, 3.6 and 
3.9 respectively. 

Expression analysis of reference genes for qPCR 

Using quantitative Real-Time PCR we studied the expres- 
sion of 12 out of 19 candidate reference genes in cDNA 
samples of table grape genotypes from different pheno- 
logical stages. Most of the genes showed a similar expres- 
sion pattern considering the different samples under study, 
e.g., lower expression at anthesis and fruit-setting stages 
and slightly higher expression in the 6-8 mm berry size 
stage (Figure 3, C-L). Other genes such as VvAIGl and 
VvTCPB did not show significant differences in their ex- 
pression along the different phenological stages and in the 
different samples (segregants) studied (Figure 3, A and B). 
As a control, we included three genes studied by Reid et al. 
[7], VvlIBQlO, VvPIP2B and VvEFl-a, which presented an 
expression profile similar to the set of putative reference 
genes, with appreciable differences between phenological 
stages (Figure 3, M-O). Interestingly, this set of three con- 
trol genes, commonly used in gene expression studies in 
grapevine exhibited very "unstable", non-uniform or too- 
low expression levels, and so they were not included in 
the list of 242 genes initially selected, and consequently 
they are not recommendable to be used as reference genes 
in table grapes. 



Table 4 Candidate reference gene ranking according to their CV 



Genoscope ID 


Total reads 


Mean ¥ 


SD 




CHR 


Product 


GSVIVG01 0361 66001* 


80103 


1704 


791 


0,46 


chr6 


Vacuolar protein sorting-associated protein 4 


GSVIVG01 01 3003001* 


57177 


1217 


571 


0,47 


chr2 


26S proteasome non-ATPase regulatory subunit 13 


GSVIVG01 027659001* 


63133 


1343 


635 


0,47 


chr15 


Unkown protein function 


GSVIVG01 025947001* 


64396 


1370 


657 


0,48 


chr18 


Protein AIG1 


GSVIVG01 03581 4001* 


79018 


1681 


818 


0,49 


chr4 


Unkown protein function 


GSVIVG0 1038268001* 


162669 


3461 


1689 


0,49 


chr5 


Rab GDP dissociation inhibitor alpha 


GSVIVG0 1008708001* 


90480 


1925 


941 


0,49 


chr18 


T-complex protein 1 subunit beta 


GSVIVG01 028520001* 


96994 


2064 


1009 


0,49 


chr7 


26S protease regulatory subunit 4 homolog 


GSVIVG01 01 2792001* 


56443 


1201 


588 


0,49 


chr18 


Putative peptidase 


GSVIVG01 031 067001 


67287 


1432 


705 


0,49 


chr14 


T-complex protein 1 subunit zeta 


GSVIVG01 033771 001 


83853 


1784 


883 


0,49 


chr8 


Splicing factor U2af small subunit A 


GSVIVG01 0331 72001 


78058 


1661 


822 


0,50 


chr4 


Serine/Arginine-rich splicing factor 7 


GSVIVG01016731001 


69545 


1480 


734 


0,50 


chr9 


Proteasome subunit alpha type-6 


GSVIVG0 1028854001 


82587 


1757 


875 


0,50 


chr16 


40S ribosomal protein S10-1 


GSVIVG0 1033442001* 


63350 


1348 


673 


0,50 


chr8 


Carbon catabolite repressor protein 4 homolog 2 


GSVIVG01 03781 4001* 


70685 


1504 


754 


0,50 


chr3 


Unkown protein function 


GSVIVG01 01 5062001* 


59049 


1256 


637 


0,51 


chrl 1 


Aldehyde dehydrogenase family 7 member A1 


GSVIVG01 03021 5001* 


155807 


3315 


1682 


0,51 


chr8 


Proactivator polypeptide-like 1 


GSVIVG01 01 6593001* 


101022 


2149 


1091 


0,51 


chr13 


Actin-depolymerizing factor 2 



SD standard deviation; CV coefficient of variation; CHR chromosome location for each gene. 

*Genes studied in this work; *genes that showed double amplicon. 

^Threshold mean 1208.2 (percentile 97.5%). 

threshold coefficient of variation 0.513 (percentile 2.5%). 
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Table 5 List of primers designed for the 19 candidate reference genes considered in this study 



Genoscope ID Gene abbreviation GenBank accession 



Primer sequence (5-3') 



Product size (bp) TM (°C) 



GSVIW01038268001 


VvRABI 


XM_002280570 


F: GCAAGGCTCAGTGCTGmA 
R: TOGGATOGGTGGCTCATA 


217 


60 


GSVIVT01030215001 


VvPPl 


XM_002268545 


F: GAGCCAGGAATCCACAAAGAC 
R: AGAACCGACCAAACCCAAACT 


166 


62 


GSVIVT01016593001 


VvADF2 


XM_002284004 


F: GGCCmGTCGCTGmCCT 
R: AGTGGGCTCACCAACC^ 


268 


60 


GSVIVT01 028520001 


VvPR26S 


XM_002263298 


F: GAGCAAGTOAAGCCGCAGGAG 
R: CCCACGGACGACGACACGAT 


138 


62 


GSVIVTO 1008708001 


VvTCPB 


XM_002285876 


F: AGACAGTGATOACAGCCGAG^ 
R: ATCCCTGCGTGGCmOTCC 


238 


64 


GSVIVT01033771001 


VvSFU2 


XM_002277409 


F: CCCCACCCTCCTCCmCCAAC 
R: TGGTCAGCCAAATOTCACAGA 


192 


64 


GSVIW01028854001 


VvPR40 


XM_002273250 


F: GA^GTGCCTGCCACCITGA 
R: AACCTCCACCTCCTCGTCCA 


257 


62 


GSVIVT01036166001 


VvSAP4 


XM_002262726 


F: AGCCTAATGTGAAGTGGAGC 
R: AACAGCOTGGCTAGGTATG 


179 


60 


GSVIVT01035814001 


VvUNP2 


XM_002284964 


F: AGATACAGAGGCAGGAGAAGT 
R: AGAATOGGAATCCAGTGAGG 


214 


64 


GSVIVT01033172001 


VvSF7 


XM_002272621 


F: GAGCGAGAAC1TGAAGATGAG 
R: CAAACGGCATOACGGGCAAA 


258 


62 


GSVIVT01037814001 


VvUNP3 


FQ387200 


F: ACGCTCCTCAGTACGGTCAG 
R: AGAGCAGCCAAACATCOTC 


91 


60 


GSVI\7T01016731001 


VvPSA 


XM_002271893 


F: ATGGACCTCGCCTOTCAAAT 
R: TCCTCGGTGGACAACACTCTG 


262 


62 


GSVIVT01031067001 


VvTCPZ 


XM_002283474 


F: CTOTGAAACAATCAGAACGCTAC 
R: TCAGGCTCATCACCCATOCCA 


140 


62 


GSVIVT01 025947001 


VvAIGl 


XM_002281960 


F: GAAGATOmGGGCCGTGAG 
R: OTOTGGOTCATCOTGGT 


108 


62 


GSVIVTO 1033442001 


VvCCRP 


XM_002280954 


F: TOGmGAAGTOGACGCTCTA 
R: AGTGACGAGGAGTAGGTGAGG 


173 


64 


GSVIVTOI 027659001 


VvUNP 


XM_002280576 


F: TCGGACOTCGGATOGCAT 
R: CACTCCAGTGGGTAGCATAG 


227 


60 


GSVIVT01015062001 


VvADH7 


XM_002278057 


F: TCCGGCGAATCCTGGATG™ 
R: CCGTCACCACCGCAATCCTCT 


104 


64 


GSVIVT01013003001 


VvPRN26S 


XM_003631440 


F: GAAGCTCTGGCACCAACTCACT 
R: ACTGCCTAGAAACTATGACAGCAA 


158 


64 


GSVIVT01012792001 


VvSLP 


FQ388031 


F: GCCGTCCACATCAmACACT 
R: AGCOTOTGGCAGCCTCCTC 


108 


62 



F forward primer. R reverse primer, bp base pairs. TM melting temperatura given in °C. 



Validation of candidate reference genes 

For the validation of VvAIGl and VvTCPB as reference 
genes, we studied their expression profile also in more 
advanced phenological stages (pre- veraison and post- 
veraison), using cv. Sultanina as a model table grape 
genotype. Some authors as Gamm et al [34] and Artico 



et aL [23] among others, recommend that the ideal ref- 
erence genes should be expressed at a constant level 
throughout the plant tissues, developmental stages or 
physiological conditions, and not be influenced by exogen- 
ous treatments but no one gene has such a stable expres- 
sion under every experimental condition, as numerous 
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Figure 2 Variability of threshold cycles (Ct) value in each reference gene among all tested samples. A line across the box indicates the 
median. The box indicates the 25th and 75th percentiles. Whiskers represent the maximum and minimum values. 



studies reported that expression of housekeeping genes 
can also vary considerably under particular experimental 
conditions as it is observed in the Figure 4. VvAIGl 
and VvTCPB genes neither presented significant differ- 
ences in their expression at the growing stages evaluated 
(Figures 5 A and 5B, respectively). Similar results for these 
two genes were observed in cvs. Red Globe, Crimson 
seedless and Muscat of Alexandria, a set of genotypes 
representing at some extent the genetic diversity of table 
grapes [51]. In addition, we evaluated the performance 
of these two reference genes in leaves with similar results 
as berries (data not shown). 

To complement this, we used geNorm algorithm to 
determine the most stable reference genes assuming that 
two ideal reference genes should not vary in comparison 
with each other in the different tested conditions. This 
algorithm calculates the average pair wise variation of a 
given candidate reference genes set with all other genes 
under evaluation and assigns a measure of its expression 
stability (M), based on which a ranking of candidate ref- 
erence genes is produced [8]. The geNorm software has 
been cited for many authors in relation to the identification 
or behavior of reference genes; this is because of its 
easiness, robustness, reliability and convenience of use, 
and so it is currently included in qRT-PCR analyses in 
animals, yeasts, bacteria but rarely in plants [52]. Our 
results based on geNorm were consistent with this couple 
of genes being very stable regarding gene expression in 
the analyzed samples. 

For anthesis, the two most stable genes were VvAIGl and 
VvCCRP (Figure 4A); in the case of fruit-setting these were 
VvlINP and VvCCRP (Figure 4B); and for 6-8 mm berries, 
the most stable genes were VvlINP and VvAIGl (Figure 4C). 
Other genes considered in this work (EF, PP2A and UBQ10) 



were studied in other species of plants such as soybean [53] 
and Gossypium hirsutum [23], showing a high variability 
in their expression profile depending of the physiological 
condition, tissues and genotypes. 

In summary, the most stable reference genes for all 
samples studied (different genotypes evaluated at dif- 
ferent phenological stages) were VvTCPB and VvAIGl 
(Figure 4D). These results demonstrate that our approach 
allowed us to obtain a set of genes that could be used as 
reference genes in qPCR experiments; this is similar to 
the result obtained by Coito et al. [40], where they proved 
the accuracy of choosing a combination of grapevine 
reference genes for qPCR, but in that case through a 
microarray analysis. 

Conclusions 

This work is the first study that shows that a data set 
derived from a massive RNA sequencing for several indi- 
viduals and phenotypic conditions can be used for the 
identification of housekeeping genes in a non-model plant 
species such as grapevine. The genes VvTCPB and VvAIGl, 
never cited before as possible reference genes in this 
or other woody species were the most stable genes in 
all samples studied. Then, these genes are proposed as 
reference genes to be used in qPCR assays in table 
grape berries at different developmental stages and 
physiological conditions. 

Methods 

Plant material 

Twelve table grape segregants belonging to a 'Ruby seedless' 
x 'Sultanina crossing of contrasting and extreme phenotypes 
respect of seed content and berry size plus both parents 
were used in the RNA-Seq experiments (Munoz et al, 
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(See figure on previous page.) 

Figure 3 qPCR expression values for candidate reference genes in grapevine samples. Two segregants from the Ruby x Sultanina crossing 
(1 12 and 19) in three phenological stages (anthesis, fruit-setting and 6-8 mm berries) treated or not with gibberellic acid (GA3) were used. These 
segregants represent extreme phenotypes for berry size. For relative expression the genes were normalized with the lowest expression gene. A, 
AIG1 {VvAIGI); B, T-complex protein 1 subunit beta {VvTCPB); C, vacuolar sorting-associated protein 4 {VvSAP4); D, 26S proteasome non-ATPase 
regulatory subunit 13 {VvPRN26S); E, carbon catabolite repressor protein 4 homolog 2 {VvCCRP); F, unkown protein function {VvUNP2); G, unkown 
protein function {VvUNP); H, unkown protein function {VvUNP3); I, Rab GDP dissociation inhibitor alpha {VvRABI); J, proactivator polypeptide-like 1 
{VvPPl); K, acting-depolymerizing factor 2 {VvADF2); L, 26S protease regulatory subunit 4 homolog {VvPR26S). Other putative housekeeping genes 
reported and used in many works are the following: M, polyubiquitin (VvUBQW, GenBank acc CB977307); N, plasma membrane intrinsic protein 
2B {VvPIP2B, GenBank acc EC969993); and O, elongation factor 1 -alpha {VvEF]-a, GenBank acc CB977561). Bars in the graphs correspond to 
standard error (SE) from three biological samples, assayed in duplicate. Different letters represent significant differences a t P < 0.05 by LSD test. 



manuscript in preparation). For RNA-Seq analyses, a 
number of whole berries from each condition (for a list 
of samples, phenological stages, etc., see Additional file 3: 
Table S3) was frozen in liquid nitrogen, homogenized and 
their RNA was sequenced after converted to cDNA, obtain- 
ing ca. 500 million reads from 47 sequenced samples. 

For the qPCR validation of the 19 candidate reference 
genes, two genotypes from the same crossing collected 
at three phenological stages (anthesis, fruit-setting and 
6-8 mm berries, treated or not with gibberellic acid) were 
used. We also included samples of 'Sultanina collected at 
more advanced phenological stages (pre-veraison and post- 
veraison). The vines, established at La Platina Experimental 
Station of the 'Institute de Investigaciones Agropecuarias,' 
located in Santiago, Chile, were maintained under a stand- 
ard management program for watering, fertilization, pests 



and diseases control and pruning. After harvest, every 
sample was immediately frozen in liquid nitrogen and 
stored at -80°C until use. 

Public data used 

The reference grape genome (12X) and the gene annota- 
tion were downloaded from the GENOSCOPE database 
(http://www.genoscope.cns.fr/externe/GenomeBrowser/Vitis/). 
The reference genome contains a total of 26,346 annotated 
transcripts with an average size of 1,137 base pairs. 

Identification of candidate reference genes 

To build the RNA-Seq data-base, a total of 491 million 
reads were generated in a Genome Analyzer II, from 
Illumina (IGA, Udine, Italy). After the quality trimming, 
477 million reads were kept, and 91% of them were located 
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Figure 4 Expression stability values (M) and ranking of 14 candidate housekeeping genes as calculated by geNORM algorithm. Average 
expression stability value (M) of the candidate genes was measured during stepwise exclusion of the least stable candidate genes. Genes with 
the lowest M values have the most stable expression. Twenty-four cDNAs corresponding to different phenological stages were used: A, anthesis; 
B, fruit-setting; C, 6-8 mm berries; and D represents all the phenological stages. 
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Fruit-Setting 6-8mm V-2 Veraison V+2 Fruit-Setting 6-8mm V-2 Veraison V+2 



Figure 5 Validation by qPCR of two putative reference genes in cDNA from 'Sultanina' samples. We selected 20 samples from five 
different phenological stages: before (fruit-setting, 6-8 mm berry size and V-2) and after (V + 2) veraison (V). A, AIG1 {VvAIGl) and B, T-complex 
protein 1 subunit beta {VvTCPB). Bars in the graphs correspond to standard error (SE) from four biological samples, Different letters represent 
significant differences at P < 0.05 by LSD test. 



in the reference grape genome by TOPHAT [54] program. 
The differential expression test on seventy comparisons 
was implemented in the edgeR [46] software. Then, using 
in-house development scripts, we searched for genes 
that were classified as non-differentially expressed, and 
presented at least 100 reads in each sample/condition 
and a low variation index among conditions. Finally, all 
these steps were executed as a bash pipeline (Figure 1). 

Derivation of the statistical test for the selection of 
reference genes 

As a first approximation to identify the reference genes, 
it was used as criteria the mean of read counts and the 
coefficient of variation (CV = standard deviation/mean) 
among the 47 different conditions for each of the 242 
non-differentially expressed genes (NDE). The relation- 
ship between these two criteria was analyzed by Pearson s 
correlation coefficient (r) using R 2.15.0 [55]. The CV has 
been previously used for this purpose in cereal crops 
[49,50]. In order to find those genes having both a high 
number of reads and a low variation coefficient among 
samples from different phenological stages and conditions, 
pseudo data sets were simulated by resampling of the ori- 
ginal data. The purpose was that the stability (low CV) 
and level of expression (high mean values of read counts) 
were due to features of the gene and not to random or 
experimental error. The procedure was performed as fol- 
lows: for each original gene we calculated the mean and 
the CV of the read counts among the different conditions. 
Then, a pseudo set of data was simulated representing a 
pseudo NDE gene under the 47 conditions. To represent 
this gene, 47 read counts were sampled at random from 
the original data matrix (247 x 47 observations) and then 
both the mean and CV were calculated for this pseudo 
NDE gene. Thus, 10,000 pseudo NDE genes were simu- 
lated. Then the 10,000 pseudo- values of the mean and CV 
were sorted from the lowest to the highest values. The 
highest 9,750-th value (percentile: 97.5%) and the lowest 



250-th value (percentile: 2.5%) of mean and CV, respec- 
tively, were used as thresholds of selection. Finally, only 
those genes that had both a mean of read counts above 
and a CV below the corresponding thresholds were selec- 
ted. This algorithm was programmed using R 2.15.0 [55]. 

RNA isolation and cDNA synthesis 

Total RNA was isolated from 3-4 g of frozen tissue using 
the modified hot borate method [56]. The quantity and 
quality of the RNA were assessed by measuring the A260/280 
ratio and by electrophoresis on a 1.2% formaldehyde-agarose 
gel. First strands of cDNA were obtained by reverse tran- 
scription reactions with 2 ug of total RNA as template, using 
MMLV-RT reverse transcriptase (Promega, Madison, WI) 
and oligo dT primers according to standard procedures. The 
concentration of cDNA was assessed by measuring the ab- 
sorbance at 260 nm, finally diluting each cDNA to 50 ng/ul 
prior to use in qPCR. Quality and quantity of cDNA was also 
determined by using a Bioanalyzer (Agilent Technologies, 
Santa Clara, CA), with equivalent results. 

Primer design 

Gene-specific primers were designed using Primer Premier 
5.0 software (Premier Biosoft International, Palo Alto, CA) 
and synthesized by Alpha DNA (Montreal, Quebec, Canada). 
The nucleotide sequences were obtained from a private 
data-base maintained at http://vitisdb.cmm.uchile.cl/. In 
addition, three genes encoding a polyubiquitin (UBQ10), 
plasma membrane intrinsic protein 2B (PIP2B) and elong- 
ation factor 1 -alpha (EF-la) and their respective pairs of 
primers were selected from previously published reports 
[28] and evaluated as a way of comparison. Accession 
numbers, primer sequences, expected size of amplicons 
and melting temperature are provided in Table 5. 

Quantitative real-time PCR assays (qPCR) 

Each transcript abundance was analyzed by real-time 
PCR with the LightCycler Real-Time PCR System (Roche 
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Diagnostics, Mannheim, Germany), using SYBR Green™ as 
a fluorescent dye to measure the amplified DNA products 
derived from RNA. Three biological samples in duplicate of 
quantitative PCR experiments were performed for each 
sample as described in Garcia-Rojas et al [57]. Briefly, 
the amplification reaction was carried out in a total 
volume of 20 ul containing 1 pmol of each primer, 
5 mM MgCl 2 , 1 ml LightCycler™ DNA Master SYBR® 
Green I (Roche Diagnostics) and 100 ng of each cDNA 
analyzed. The thermal cycle conditions were: denaturation 
at 95°C for 10 min, followed by 35 three-step cycles of tem- 
plate denaturation at 95°C with a 2 s hold, primer annealing 
at 60-65°C for 15 s and extension at 72°C for 25 s. Fluores- 
cence data was collected after each extension step. Melting 
curve analyses were performed and checked for single 
peaks, and the amplification product sizes were confirmed 
in an agarose gel to ensure the absence of non-specific PCR 
products. Fluorescence was analyzed using LightCycler™ 
Analysis Software (Roche Diagnostics). The crossing point 
for each reaction was determined using the Second Deriva- 
tive Maximum algorithm and manual baseline adjustment. 

Determination of reference gene expression stability 

Expression levels of each one of the 19 candidate reference 
genes in all samples were determined by assessing the 
number of threshold cycles (Ct) needed for the amplifica- 
tion related fluorescence to reach a specific threshold level 
detection. Ct values were transformed to quantities using a 
standard curve which is a requirement for using geNorm. 
To manage the large number of calculations generated, we 
used a Visual Basic Application (VBA) for Microsoft Excel 
that automatically calculates the gene-stability value M for 
every control gene in a given set of samples [8]. 

Statistical analysis for qPCR 

Data from qPCR was subjected to statistical analysis of 
variance, and means were separated by LSD test at 5% level 
of significance using Statgraphics Plus 5 (Manugistics Inc., 
Rockville, MD). 

The RNA-Seq data used in this study is available at the 
NCBIs Sequence Read Achieve (http://www.ncbi.nlm.nih. 
gov/sra) with the SRA Study accession number SRX366617. 
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